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1 Introduction 


Tourism has grown an integral contributor to economics for various countries. Tourist as a goods consumer 
and a services beneficiary, the huge demands bring the numerous benefits and advantages to the host coun- 
try. Tourism stimulate the economic growth from different aspects, including directly boosting economic 
units from the tourism industry (housing, food, transportation, etc), creating new employment opportunities, 
bringing important revenues to the State budget in the form of taxes and fees and enhancing the develop- 
ments of other sectors engaging in the accomplishment of the tourism product 2016). According 
to the statistical data from the WTO data for Tourism Sector including passenger Transportation Services 
(excluding freight), Travel Services, and the recreation portion of the Other Commercial Services sector 
(2008), the proportion of the Tourism Sector is the sixth largest sector of the global economy and 
the largest Service Sector industry in the world (Lew) 2011). Furthermore, the impacts of the development 


of tourism on the industrial production has been drawn great attentions. |Copeland| (1991) and |Ojaghlou 
(2019) claimed that a growing tourism will causes the de-industralization, whereas (2008) 


found that there is no evident negative impacts of the development of tourism on the manufacturing indus- 
try in Thailand. Besides, tourism and hospitality increase the number of available jobs 
(2016). Consequently, tourism as a great contributor to economic growth is needed to be investigated to 
reveal the dynamic mechanism of tourism developments via modelling and forecasting. The characteristic 
of development paths in tourism area has also been underexplored. 

Seasonality as a common characteristic has been widely found in tourism industry 
2003). Seasonality in tourism has traditionally been regarded as a major problem which needs to be over- 
come. attempt to provide a rational framework for the tourism seasonality by analysing the 
main characteristics of these challenges. studied the characteristics of seasonality and devel- 
oped a methodology to study this phenomenon. proposed guantitative solutions via financial 
portfolio theory to assist marketers in mitigating seasonal effects. Furthermore, in terms of guantitative 


analysis method, |Duro and Turriôn-Prats| (2019) investigated the causalities of seasonality using a mixed 


effects panel data model for the main tourist destinations in the world. /Ferrante et al.|(2018) propose a new 
index to measure the seasonality in tourism by analysing the pattern of seasonal swing including seasonal 


amplitude and similarity. This method focus on the ordinal and cyclical structures of seasonal variations. By 
adopting this approach, a strong relationship between seasonal patterns and the spatial distribution through- 
out European countries were found. compared the performance of various econometric 
time-series models in forecasting seasonal tourism demand and found that the methods of seasonality treat- 
ment, such as the pre-test for seasonal unit roots, affect the forecasting performance of the models. 
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evaluated the performance of Holt-Winters and Seasonal ARIMA models for forecasting for- 
eign tourist arrivals in India. studied ARIMA modelling seasonality in tourism 
forecasting with two settings, such as one for for modeling stochastic nonstationary seasonality and another 
for a constant seasonality with three seasonal dummies. The out-of-sample forecasting performances were 


evaluated. |Chen et al.| (2019) proposed a multiseries structural time series method with one variable to 


predict seasonal tourism demand. |Chang and Liao|(2010) adopted seasonal ARIMA model to forecast the 


monthly outbound tourism departures. argued that the simple deterministic mod- 
el with seasonal dummy variables and AR(1) disturbances has better forecast performance. 
proposed a SARIMA model with non-linear methods to forecast a seasonal tourist with a 
structural break in the data. investigated the performance of combination forecasts in 
international tourism demand. forecasted tourism demand with ARMA-based methods. 

The long memory phenomena has been widely studied in many areas, which motivates researchers 
to analyse this non-ignorable dependence between the present observation and all previous observations 
in a time series (2014). And the decay rate of this dependence can often be slower than 
exponential decay. provided a decent condition for a long memory stationary process using the 
autocorrelation function (ACF), denoted by p(7) for integers j, such that ym GN p(j) = oo. Furthermore, 


Granger and Joyeux) (1980) and (1981) proposed autoregressive fractionally integrated moving 


average (ARFIMA) model by incorporating a fractional differencing operator of certain order d (0 < d < 
1/2) with the classical autoregressive integrated moving average (ARIMA) model. To capture the real world 
cyclical phenomena with long-range dependence, and adopted the 
seasonal autoregressive fractionally integrated moving-average (SARFIMA) model. Moveover, to describe 
the long memory with a oscillatory pattern in many fields, extended ARFIMA model to 
Gegenbauer ARMA (GARMA) model by introducing Gegenbauer polynomial. To overcome the difficulties 
of applying the classical time series models to model, forecast and analyze time series in the form of 
counts, constructed generalised linear ARMA (GLARMA) model by modifying the 
linear predictor into a ARMA time series structure in a generalised linear regression model. With the 
prevalence of seasonal discrete time series in many areas such as biology, finance and engineering, 
extended GLARMA by replacing the ARMA structure in the linear predictor by Gegenbauer 
ARMA structure. 


1.1 Contribution and structure 


From a modelling perspective, our first contribution is to propose generalised linear regression GARMA 
(GLRGARMA) model and generalised linear regression SARMA (GLRSARMA) model with a innovative 
function of explanatory variables in order to extend GLGARMA to incorporate relevant information for 
model fitting and forecast in tourism area. Besides, the generalised Poisson (GP) distribution is adopted 
to accommodate over- equal- and under-dispersion for certain tourism data. Moreover, the performance of 
GLRGARMA model and GLRSARMA model with their nested sub-models are compared and evaluated 
using several well-known selection criteria. 

Our second contribution is to investigate the behaviour of tourism data. The pattern of long memory 
is examined. The analysis of Hurst exponent, ACF plot and periodogram plot shows that Gegenbauer 
long memory features are presented in tourism data. Furthermore, the distinct characteristics between 
Gegenbauer long memory and seasonality are demonstrated to reveal the that the GLRGARMA model is 
more suitable for modelling tourism data. 

Our third contribution is to derive a Bayesian approach via the efficient and user-friendly Rst an pack- 
age in estimating our proposed models. For ML approach, the likelihood function is untractable because 
of involving very high dimensional integrals. Several monitors of convergence of posterior samples are 
discussed, such as the number of effective sample and R estimate. The criteria for modelling performance 
are also derived. 

The rest of the paper is organised as follows. Section 2 reviews GLMs in the insurance area and long 


memory time series models for discrete data. The long memory stochastic period effect model is introduced 
in terms of mean function for eight sub-models and their reduced forms. Section 3 derives the Bayesian 
approach of our proposed models with various selection criteria. Section 4 conducts a series of simulation 
studies to validate the accuracy of the estimation techniques we develop for the proposed models. Section 5 
presents the in-sample fitting of mortality data, out-of-sample forecast and life table construction. Section 
6 concludes the paper. 


2 Models 


2.1 Long memory time series models with Gaussian innovation 


In time series settings, the terms of long memory refers to the strength of statistical dependence, extended 
temporal dependence or persistence between lagged observations in a time series. And the rate that such 
lagged dependency decreases should be slower than exponential decay, which is the main feature in the long 


memory structure time series (Graves et al. 2014). The Wold representation introduced by |Wold| (1938) 
y 


states that 


Theorem 2.1. Any zero-mean nondeterministic covariance-stationary process Y,cti23,.,ry can be ex- 
pressed as 


Yi ct ymu =W(B)e, +c, €; ~ WN(0,07), (1) 
j—0 
where £; and vb; are uniquely defined and satisfy Wo = 1, Yoo; < oo, E(ei) = 0, Ele?) = 03, 
E(ei€,) = O, Vt, s, the coefficients {c,;t € Z} is a deterministic term with E(c;,,€,) = O, Vt, s, WN stands 
for white noise. 


|| = 


Given a stationary time series process Yi.r = (Y1, Y2,..., Yr) which admits Wold representation, with 
Yi.r € (NU (0))7, [Beran] (1994) defined a condition for a long memory stationary process in terms of the 
divergence of the autocorrelation function (ACF) for Y; and Y;,,; at lag j, such that 


Z Cov(Y;, Y. 
lim y lol(j)| > œ where p(j) = ovi in) ; 
Ba v Var(Y,) Var(Yi4;) 


jaan 


According to the Szegé-Kolmogorov formula, the spectral density f,() can be derived by taking the Fourier 
transform of autocovariance functions yọ(l — j) = (To); and I7(A;,) can be derived by taking Discrete 
Fourier Transformation (9(-)) of Y,c(1,23,..,ry where Ay = Prk fork =1—j =1,---, [4] , |] represents 
the integer part and only half of freguencies are enough to demonstrate the features because of symmetry. 
The spectral density is given by 


FA) a [ yo(k) exp(—iAk)dk, =n < A<. 


= 9n “ 


In practice, the periodogram 7r(A) is usually employed as a estimator of the spectral density f.(A). 
(1995) stated that /'r(A) is unbiased but inconsistent estimator of f,(Ax) for a Gaussian white noise 


process Y,c(1,23...,ry. The periodogram is given by 


1 
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Y(Ax) = X sin(jAr)Y;. 


There are two typical types of long memory model structure, the autoregressive fractionally integrated 
moving average (ARFIMA) model class and the generalised form of ARFIMA called Gegenbauer autore- 
gressive integrated moving average (GARMA) model. and 
extended the classical ARIMA model to the ARFIMA model, which describe a long memory stationary 
process with integrated order d € (0,1/2). further extended to a generalised ARFIMA 
model called Gegenbauer ARMA (GARMA) model, which can model data with an oscillatory damping au- 
tocorrelation function (ACF) because the representation of long memory time series structure of GARMA 
model is expressed naturally in terms of Gegenbauer polynomials. 

For d € (0, 1/2), the ARFIMA model exhibit long memory features. The short memory ARMA model 
is a special case of ARFIMA model with d = 0 where the long memory operator (1 — B) 24 = 1. 


Definition 2.1 (ARFIMA). Consider a stationary time series process with constant c € R, an ARFIMA 
model with order (p, d, q) is defined by 


9(B)(Y, —c) = O(B)(1 — B) "er, e, "ÈT N(0,02), 


where the long memory operator in ARFIMA model can be represented as 


— - PG F da) j = j . 
(1- B) * =X ~~  pi— Y 9B! with da = 2d. 


where B is the backshift operator, such that BY, = Y;—ı and 
6(B)=1-—¢4,B-—---—¢,B? and 0(B) =14+0,B+---+6,B%, 
are the autoregressive and moving-average characteristic polynomials, respectively with no common roots. 


For d € (0,1/2) and u € (—1,1), the GARMA model exhibits a long memory with an oscillatory 


pattern. The ARFIMA model is the specaial case of GARMA model with u = 1 (see (Granger and Joyeux 
1980)) such that the factor (1 — 2uB + B?)~4 


Definition 2.2 (GARMA). Consider a stationary time series process with constant c € R, aGARMA model 
with order (p, d, g) is defined by 


9(B)(Y; —c) = O(B)(1 — 2uB + B?) e, = O(B) (>: wes) , & RA N(0, 02), 


where ®(B) and ©(B) are defined in Equations (2.1). The Gegenbauer long memory operator in GARMA 
model can be represented as 


(1 — 2uB + B’)? = y ej, 
j=0 


and w,; denote the coefficients of the generating function for the Gegenbauer polynomials (1 — 2u B + B 


1971). These coefficients are formulated as 


[i/2] 7 | 
(—1)1(2u)72T(d — q + j) 
y q\(j — 2g)!T(d) 


Yj = 


q=0 
where |j /2] represents the integral part of j /2. 


The coefficients y; in Equation are functionally dependent on d that controls the strength of long mem- 
ory and the Gegenbauer parameter u that controls the oscillation of ACF (Rainville; 1960). Furthermore, 
the coefficients 7; can be easily computed using the recursive formula: 


pj = 2u (— + i) Pj- — (<= + i) Wj-2, 


where the first three terms are Wp = 1, Yı = 2du and  — —d + 2d(1 + d)u?. 
Furthermore, the bounds for the coefficients in the Gegenbauer polynomials p; are the coefficients of 
ARFIMA g; 
(2d); 
lvl < UN = 


which are demonstrated in (2017). 


For a long memory process, investigating the behaviours of ACF with different values of d and u helps 
to understand the characteristic of long memory time series. In particular, when u = —1 and 0 < d < 1/4, 
the ACF can be written as 


jo 


T1 — 2d)T(j + 2) 
T(2d)T(j — 2d + 1) 


~ constant - (—1)4j44"1, as j — oo. 


PG) = (-1) 
For ARFIMA(0, d, 0) with u = 1 and 0 < d < 1/4, it is given asymptotically by, 


T(1—2d)T(j + 2d 
pli) = l E E ~ constant - j* 


ii ; 
T0d)T(G — 2d4 1) RA 


In addition, we note that the closed form ACF for the GARMA (0, d, 0) model with the constraint given by 


lu| < 1,0 < d < 1/2 is unavailable (Woodward et al., 1998) but is asymptotically given by 
p(j) ~ constant - j°% } sin(md — jào), as j — oo, 


where Ag = cos '(u). For different values of d and u, the ACF plots of long memory time series show 
different patterns. 


2.2 long memory models for time series of counts: GLGARMA 


To model discrete time series with Gegenbauer long memory features, {Yan et al.|(2017) extended the GLM 
to generalised linear GARMA model (GLGARMA) model which combines the GLM and the GARMA 
time series model. Hence, the GLGARMA model consisting of an observation variable and state variables 


is a generalised state-space model for a time series Yi.r = (Y1, Y2,..., Yr) . The canonical log link is 
adopted as the mean of a count distribution to ensure a strictly positive intensity. 


Definition 2.3 (GLGARMA model). The general model for Y te{0,1,2,...} of order (p, d, g) is defined as 


YilZ1iu—i; X14 â D(a, v), 
(1 — 2uB + B?)12(B)(In pu — c — bX;) = O(B)es, 
iid 
= f ~ N(0,07) for PD, (2) 


— Yi-1—Mt—1 
er = a for OD, 


D(-,-) represents the Poisson, NB, GP or DP distributions. The dispersion parameter is defined as 

€ (—1,1) for GP distribution and v > 0 for NB and DP distributions. ®(B) and O(B) are defined in 

Equations (2.1). The Gegenbauer parameter |u| < 1 controls the pattern of oscillation and the long memory 
parameter 0 < d < 1/2 determines the strength of long memory. 

The data filtration F;.,-; corresponds to observed realizations of time series up to time t — 1. It is the 
process of choosing a subset of the data set for analysis. So natural sigma algebra F;.,_; for the observed 
data filtration means the analysis requires any subset of a given data (Y1, Yo,--- , Yri) (Yi € (NU {0})) 
including empty set. 

For the error terms in the mean function 4, there are two definitions under the parametric driven (PD) 
and observation driven (OD) modelling approaches and they are defined in equation (2). These state equa- 
tions combined with the structural equation in (2) form the structure of a class of count valued state space 
models. The errors of PD model follow a normal distribution which together with the data distributions, 
Poisson, negative binomial (NB), double Poisson (DP) and generalised Poisson (GP), constitute two sources 
of randomness whereas the errors of OD model are defined in terms of observations Y; and mean functions 
Ht- 

For the data distributions, the Poisson and NB are well-unknown. GP distribution has the pmf, mean 
and variance given by 


fyn Me, V) = hll v) v) + vy ie md" wyl, ue > 0, -1<v<l, 


EY) =u and Var(Y;) = pu(l — vy, 


respectively. Furthermore, the GP distribution is over-, under- and equi-dispersed when v is greater than, 
less than and equal to O respectively. On the other hand, the pmf for the DP distribution is given by 


f (yu ta, v) = c(P, h) f (Yt eV). pe > 0, >0, 


where the normalizing term c(v, p+), given by |Efron| (1986), is 
< l-v 1 

= defi Yes MeV) 14 2 (1 | i 
y=0 e 


Because of the complicated structure of c(v, u+), some properties of the DP distribution are difficult to 
derive. The unnormalised pmf, mean and variance are given by 


o e yt e vy 
imate (28) (2) 
ye Yt 
E(Y:) © pe and  Var(Y,) ~ E, 


V, pL 


respectively. The DP distribution is over-, under- and equi-dispersed when v is less than, greater than and 
equal to 1 respectively. 


2.3 Generalised linear regression models with periodic features 


The results from/Yan et al.|(2017) shows that generalised Poisson is consistently the best distribution choice 
compared with Poisson, NB and DP distributions. Furthermore, PD type models with higher flexibilities 
is often better than OD type models in both in-sample fitting and put-sample forecast. In terms of model 
structure, GLGARMA type model outperforms other types model including generalised autoregressive s- 
core (GAS) model (Creal et al.|/2013), autoregressive conditional Poisson (ACP) model 
and GLARFIMA model in both short and long memory data with or without obvious 
seasonal periodicity. In this study, since our models are extended from GLGARMA types model, basing 
on the conclusions from [Yan et al.| (2017), only PD type error term and GP distribution with Gegenbauer 
and seasonal periodic process are considered. Hence, generalised linear regression GARMA (GLRGAR- 
MA) model and generalised linear regression SARMA (GLRSARMA) model are proposed to capture the 
periodic sponge effect in a time series. 


Definition 2.4 (GLRGARMA model). Given a discrete stationary time series process and a natural o- 
algebra for the observed data filtration Fy = o(Yi, Ys," , Yri) witht € [1,7], a GLGARMA model 
with order (p, d, g) is defined by 


Yil Fit-1, Xiz Px GP( ut, v), 
e(B)ln(m) = c+6g(X.)+ (1—2uB+ B^?) O(B)e, 


e MU N(0,07) 


where g(X+) € R" are the function of explanatory variables representing the entire feature state at 
time t with regressors X; , for t € [1,7] and w € [1, W]. 8 € R is a parameter vector that describing the 
relationship between Y and X. 

In addition to the model properties and periodic specification, namely, the seasonal difference. 


Definition 2.5 (GLRSARMA model). The GLSARIMA model with order (p, s, g) is given by 


Yil7 iai, Xit ms GP( Lu, V), 
®(B)(1—aB*) Inu, = c+ 6g(Xt)+ O(B)(1 — yB*)€; 
e ~ N(0,07) 


where (1 — B*) is the standard integer seasonal difference operator with the integer seasonal period S 
that defines the frequency of the seasonal pattern. 


2.3.1 ACF oscillation in Gagenbauer fractional differences versus integer and fractional seasonal 
difference SARFIMA models. 


The differences between the SARIMA model and GARMA model are demonstrated by 
and clarified that there exists a clear distinct feature between seasonal oscillation and the oscillation that 
comes from a Gegenbauer long memory process. Their results further showed that the deseasonalisation 
cannot remove the oscillating behavior in the Gegenbauer long memory time series. The first row in Figure 
shows the time series plot simulated by SARMA (left panel) and GARMA (right panel). For SARMA 
model, the period is set to 12 which is agree with our real world data (1 year period for monthly data). 
For GARMA data set, the long memory parameter is d = 0.49 and Gegenbauer parameter isu = 0.7, 
which is similar to the period of SARMA. According to the time series, the tendencies of SARMA and 


GARMA looks very similar. It is hardly to distinguish the nature between these two series. The ACF plots 
(second row) and periodogram (last row) plots are provided to analysis the fundamental characteristics 
between SARMA and GARMA. The ACF plot for SARMA shows damped periodic peaks with overall 
short memory pattern. For GARMA, the ACF plot shows a typical Gegenbauer long memory ACF. There 
exists an oscillated long memory pattern. The periodogram plot as an representation of ACF in frequency 
domain can easily reveal the differences between SARMA and GARMA. The peaks for SARMA model in 
periodogram plot are allocated in several places. These peaks must be located in 0, 7/2 and 7 because the 
number of peaks represents the period, which can be regarded that these peaks chop the region [0, 7r] into 
12 pieces. For GARMA model, the location of the peak represents the period of Gegenbauer long memory 
process, which can be interpret into A = cos~'(w) where A is the location of the peak. 
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Figure 1: Simulation studies for SARMA and GARMA models 


2.3.2 Sub-models. 


Considering some special cases of 8g(X+), the GLRGARMA model and GLRSARMA model can be fur- 
ther divided into 8 different sub-models with their mean functions defined in Table [I] 


Table 1: 10 different sub-models where model 1,3,5,6 are GLRGARMA models, model 2,7,9,10 are GLR- 
SARMA models. 


Model mean function 

model 1 In p = Bo + 61X1 + (1 — 2uB + B^?) dg, 

model 2 In ug = Bo + b1Xı + B2X2 + (1 — 2uB + B^?) de, 
model 3 In uy = (1 — 2uB + B?)-4X +e, 

model 4 In py = (1 — 2uB + B?)-4X + (1 — 2uB + B?)~4e, 
model 5 In ju = bo + b1Xı + aln us + (1 — yB*)€ 

model 6 In ut = B1.X1  D2X2 + ln us + (1 — B*)€; 

model 7 Ing, = (1 — ôB*)X + aln us + E: 

model 8 In pe = (1 — 6B*)X + aln p-s + (1 — yB*)€ 


3 Bayesian inference 


In this study, we adopt Bayesian inference that Bayes theorem is used to implement in-sample fitting and 
out-sample forecast because apriori beliefs on model structures can be incorporated into the state space 
structures. In addition, the computation complexities in evaluating the marginal likelihood function for 
PD model which involve latent values and high-dimensional integration can be avoided (2017). 
In terms of forecast, posterior predictive distributions can be obtained to provide distributional forecast 
summaries, which helps to gain a deeper understanding on the model features. For example, the creditable 
intervals for all parameters can be easily constructed basing on the posterior distributions. 

Given a set of observed values yir = (y1, Y2,- --, Yr), with each y; € Z* U {0} and the vector of 
unknown parameters ®*, the posterior distribution for P* conditional on 4y1.r is given by 


nwn gf y ti 


which is proportional to the likelihood function f(y1-r|¥*) and the prior densities 7(*) estimated using 
available information or past data. Even if there is no prior information, we can still apply non-informative 
or reference priors. 


3.1 Bayesian model 


Let 9* = (V, z) denote the vector of all model parameter 3 = (c, u, d, 0;, $j, o°, V) and state parameter 
E€ = Er = (€1,£2,:** ,€r), with each e; € R. For demonstrating purpose, both GLRGARMA model 
and GLRSARMA model with simple regression structure 6X;, p = 0 for ®(B) and q = 0 for O(B) are 
proposed as examples. Hence, for GLRGARMA model, the latent process is ln(J4) = c + 6(X;) + (1 — 
2uB + B?)~4e, and the set of model parameter is Y = (c, u, d, 8,07, v). 

The priors 7(#) are defined as: 


u ~ U(-1,1), d~U(0,1/2), c~ N(0,02), 
Be N(0,03), o° ~T(a,b) and v ~ U(-1,1), 
are adopted in which U (a„, bu) denotes the uniform priors on the range (ay, bu) for the long memory param- 


eters u, d and v and T (a, b) denotes the gamma prior with shape and scale parameters a and b respectively 
for the scale parameter o?. The joint posterior distribution for the GLRGARMA(0, d, 0) model with GP 


data distribution is 


FO lyr, eur) = f(yur|€ur, eur, 9) f(€ur|0)n(0) 
x ll exp(c + (X) + S725 Vi€&i-i)(l — v)[exple + BO) + Ry VE) (1 —v) + ye] 
a I (yr 1) exp —yw — exp(c + (Xz) + Doe Vi&-;)(1 — v) 


1 aN ( BN naa 
or oi Jesr(-)-(- a (g?)* te GD91,(—1, 1)74(0,0.5)Z,(—1, 1), 


where the hyperparameters are set to be o2 = 03 = 10, a = 3 and b = 1. 
For GLRSARMA model, the latent process is (1 — B*) ln ju = c+ bg(Xt) + (1 — B*)e; and the set of 
model parameter is V = (c, 3,07, v). The priors m (9) are defined as: 


c~ N(0,02), 8 N(0,1), y~N(0,1), a N(0,1), o^ T(a,b) and v ~ U(-1,1), 
The joint posterior distribution for the GLRSARMA(0, s, 0) model with GP data distribution is 


FF yrr, tir) = f (yur|€ur, trr, 9) f (€ur|9)n(9) 


. i p exp(c + B(X;) + aln m-s + (1 —yB*)e.-)(l —») 
T (ys + 1) exp —yy — exp(c-r PLAX) + aln us + (1 —yB?)e ;)1.— v) 


x[exp(e + (Xi) + on py-s + (1 — yB*)ei) (1—v) + yoo] 


1 e? e P -i (edb 
= exp -=+ — a-i =T (1, 1). 
< ef ss Jesn( <a) (- 202 ey ( ) 


3.2 Bayesian forecasting 


One outstanding advantage of Bayesian inference in forecasting is the construction of posterior predictive 
distributions for all forecasts. In this study, m-step forecasts YT+1:T+m are constructed via a sequence of 


1-step ahead forecasts of yrs, s = 1,...,m using sliding window where the observed data filtration for 
each window is F,.71,—; and 21.7. The posterior predictive distribution for yr+s, s = 1,...,m is defined 
as 


FYr+s|Fs:T+s-1, pics) 


=}. : J flrsslirss Ò, F sT i-i} Ts:T+s)f(UT+s|Hs:T+s-1; v, FT; era) (O| Fs:T+s-1, Bete) dpts:T+sd¥, 


and this integral can be approximated by the Monte Carlo estimator, constructed from posterior samples 
according to: 


L 

A 1 l 

F(Yr+s|Fs:T+s-1, dp) — y y» f( (yra |d 9D, aan Ts:T+s)- 
l=1 


tS 


In this study, we set L = 90, 000 as the number of iterations after burn-in in each MCMC sampler run given 
the current window of information F,.7,,—; and £s:r+s. In addition, u®, +s and 9) are the /-th draw in the 
posterior sample of Mr» and 7, respectively. For Bayesian inference, except of the posterior predictive 
distribution, the various posterior predictive point estimators and predictive credible intervals can also be 
obtained. Empirical Bayes forecasts as a typical forecast estimator in the Bayesian setting can improve 
the computational efficiency (?). Moreover, the features of frequentist and empirical Bayes forecasts are 
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compared by ?. For empirical Bayes forecasts, the calculations undertake condition upon selected posterior 
(in sample) point estimators denoted by V and fs:r+s rather than integrating out posterior (in sample) 
parameter uncertainty from the predictive distribution and resultant forecast estimators (Yan et al., 2017). 


FEP yn yu) = F(YT+s|fs:T+8; Ô,, F seat) 


Typically the point estimators used in 0, (similarly for fts:r+s) are either formed from the maximum-a- 
posteriori estimate (MAP) or the estimate which minimizes the posterior expected loss. The concept of 
MAP is similar to the ML estimate when the priors are uninformative since in this case 9, is the mode of 
the posterior distribution 


0, wap = arg max fa, (O |F sT4s—1)- 


s 


Another Bayes estimator which minimises the posterior expected loss (PEL) is defined as 


Ü PEL = arg min E(L(0,, D5|Fe:r4s—1))s 
9 


s 


where L(0,, Ü, |Fs:T+s-1) is the loss function. One example is the commonly used minimum mean square 
error (MSE) estimator defined as 


Üs MSE = arg min E([V, — O]? |Fs:T4s-1), 
where GA mseg corresponds to the posterior mean E(9,|F::r}s-1) = 5. If the minimum absolute error 
(AE) estimator L(0,, 0.) = lð, — 9, 


is used, it gives V, Ag =  o.5 which is the posterior median. 


3.3 Bayesian tool: implementation with Rstan 


To implement the proposed models efficiently, we choose the Bayesian R package Rstan which utilises 
the stan program within R developed in the C++ language. The sampling method applied in Rstan is 


the Hamiltonian Monte Carlo (HMC) sampler (see (1987), [1994)) that belongs to the 
class of Markov chain Monte Carlo (MCMC) sampling methods (see |1953)). For some 
complicated models with many parameters, the HMC sampler converges faster than the conventional sam- 
plers such as random-walk Metropolis (see (Metropolis et al.) 1953)) and Gibbs sampler (see 
[1984)) because the HMC sampler transforms the simulation of high-dimensional target distribu- 
tion into the simulation of Hamiltonian dynamics (see [2011)). By using the gradient information, 
Hamiltonian dynamics can produce distant proposals for the Metropolis algorithm, thereby avoiding the 
slow exploration of the state space that results from the diffusive behaviour of random-walk proposals. 

For the HMC sampler, to assess the dependence, precision and convergence of posterior sample, three 
measures are reported in Rstan. The first measure is the number of effective samples which indicates 
dependence within a Monte Carlo sample. The second measure is the Monte Carlo standard error (MCSE) 


MCSE — posterior standard deviation 


number of effective samples’ 


which reports the error of estimation for the posterior mean. To monitor the convergence for k > 2 chains 


of length 2n each, |Gelman and Rubin|(1992) proposed R which is defined as 


df 


a: 
BH ap =o" 


where 


9. 9 y2 172 
Banya y= O) 


[Cov(s?, 3, ) — 20. Cov(s?, 0], (4) 


s; is the within-chain variance and V;; is the j-th parameter in chain 7. If R is close to 1, the parameter 0 
has converged. 

In this study, the number of chain is k = 1 and hence B = 0 in and (4). For this chain, there are 
overall totally 100,000 iterations, with the first 10,000 iterations as burn-in. Hence, there are L = 90, 000 
subsequent iterations with thin set to 1. The values of R for each estimators and the history plot are carefully 
checked to ensure that all parameters meet the convergence condition. For all of in-sample fitting and out 
sample forecast studies, the number of effective sample ranges from 75,000 to 86,000 across parameters. 
The range of R is between 1.0000 and 1.0003, which indicates moderate dependency and clear convergence. 


3.4 Model selection and forecast performance 


The performance of each model is evaluated through a popular Bayesian model selection criteria called de- 
viance information criterion (DIC) (Spiegelhalter et al.|/2014). As a generalisation of Akaike’s Information 
Criterion (AIC), DIC can deal with models containing informative priors, such as hierarchical models. As 
the priors can effectively restrict the freedom of model parameters, the number of parameters as required in 
the calculation of AIC is generally unclear. DIC overcomes such problems by providing an estimate for the 
effective number of parameters. The DIC can be calculated using the equation 


DIC = D+ pp = 2D — D(V,), 


where the deviance is defined as D(0,) = —2ln(f(y,|9.)), D = Es,,[—2ln(/(y,|0.))] measures 
the model fit and the estimated number of parameters pp = D — D(9,) measures model complexity 
(Spiegelhalter et al. 2002). 

Consider the m-step ahead forecasts s ; given by the posterior mean or median and the observations 
Yx + With T time point and g age groups, the forecast performance can be evaluated by adopting three types 
of measures, namely residuals vr; = Ur; — e,t, percentage errors Prt = i x 100 and scaled errors €s, 
defined in Equation (7). Based on rz and p, +, three popular criteria, namely mean absolute error (MAE), 
root mean squared error (RMSE) and mean absolute percentage error (MAPE), are defined respectively 
below 


MAE- ly. LS aril RMSE = ay 
g WW g^ g 


g—l 


(ee 
and MR J 2 J pare . (5) 
z—l t=1 


However r+ ; are scale-dependent making comparison difficult. Although pz + are scale-free, they are sensi- 
tive to observations close to zero. Hence the fourth criterion we adopt is mean absolute scaled error (MASE) 
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defined as 


jf lp Z 
MASE =-->% 2 y eer (6) 
t—l 


g x=1 


making use of the scaled errors 


Tr T+t (7) 


? 


Er, TH = m 
an yp (Yx T+ — Ye T+-1l 


proposed by Hyndman and Koehler| (2006). Furthermore, similar approach can also be applied to evaluate 


estimated results Ê+ + calculated by the posterior mean or median. Hence, the residuals r$; = Hrt — [lr 


percentage errors p>, = 7 = x 100 and scaled errors 
S 
S __ Pot 
Ct = ; 


a 2. [Met — Mee 
t=2 
can be used to construct similar criteria for the u estimator, namely the MAE, RMSE, MAPE and MASE 
by using the same formulas in Equations (5) to (6). 


4 Data analysis 


Denmark is often be analysed in political-economic research area because of the lowest Gini coefficient 
(latest OECD figures from 2012), strong local companies with great competitiveness and prominent eco- 
nomic performance (2016). The data set analysed in this study is obtained from 
that is the central authority on Danish statistics. It is a state institution under the Ministry of E- 
conomic Affairs and the Interior. They collect, compile and publish statistics on the Danish society. Danish 
was a predominantly agricultural country. After the year of 1945, Denmark has significantly developed the 
industrial base and service sector. By 2017, the agriculture sector only contribute less than 2% of overall 
GDP. On the other hand, the industrial base and services contribute around 18% and 76%, respectively 
(2020). And tourism is the most important component in Denmark's tertiary industry, which can be 
a representative indicator of service sector. In this study, we adopt rented hotel room numbers in a month 
as a index to investigate the dynamic mechanism of service sector statistics. Furthermore, we use industrial 
production index (IPI) to describe the changes in the manufacturing because the IPI is a key indicator to 
measure the output of industrial economic activities. In order to describe the characteristic of agriculture in 
Denmark, we adopt the percentage of agriculture in Gross Domestic Product (GDP) as an index. Agricul- 
ture sector includes forestry, hunting, and fishing, cultivation of crops and livestock production. Besides, 
The labour market of Denmark is the freest in Europe (2020). Employers maintain a very high level 
of flexibility, which means they can hire and fire whenever they want. And on the other hand, the unem- 
ployment compensation is relatively high to guarantee a stable living standard for unemployed persons. 
The unemployed rate used in this paper is the statistic for unemployment comprise all unemployed persons 
basing on the resident population in Denmark. In addition, the statistics of electricity production is used to 
measure the industrial production activities. 


4.1 Empirical data analysis 


The Hurst exponent H also known as the index of long-range dependence was proposed by (1951). 
It is a classical self-similarity parameter that measures the long memory feature in a time series (Millen 
and Beard, 2003). Since it is robust with few assumptions about underlying system, it has been widely 
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applied to many fields (Qian and Rasheed| 2004).A value of H in the range (2, 1) indicates long memory in 
a time series, which means that a high value in the series will more likely be followed by another high value 


and such an effect is likely to maintain for a long period into the future. A value of H = 1 can indicate 
a standard Brownian motion which is a short memory process. Furthermore, There exists a relationship 
between d and H, which is then given by d = H — 0.5. Consequently, the estimator of the Hurst exponent 
H can be approximate the long memory parameter d (2017). There exist various estimators of 
H, in this section, a well-known estimator called rescaled range analysis (R/S) is adopted to implement 
following empirical studies. 

proposed the first Hurst exponent estimator using the rescaled range R/S analysis to 
measure the intensity of long-range dependence. Given a time series Y,e(1,23,... rv. the sample mean and 
the standard deviation process are given by 


T t 


rm and S,— “ y (8) 


j=l j=l 


where the mean adjusted series X; = Y; — Y'r. Then a cumulative sum series is given by Z; = y Xj 
and the cumulative range based on these sums is 


R; = Max (0, Z1,- , Za) — Min (0, 21,--- , Za). (9) 


An important proposition for the estimator of H was derived by Mandelbrot|(1975). 


Proposition 4.1. Consider a time series Y, € R and define S; and R, in Equations (8) and (9) respectively, 
then 3 C € R such that the following asymptotic property of the rescaled range R/S holds 


[R/S\(T) = => R/S: LOT®. as T 08, 


t=1 


In addition, for small sample size 7', the rescaled range R/S can also be approximated by following equation 


(Annis and Lloyd} 1976) 


ML eae Fi for T < 340 


T vaT) —j-l J 
R/S\(T) = j 
[R/S|(T) T— ma 1y NA for T > 340 


where the _ /2 term was added by (1994). The H estimate can be obtained by a simple linear 
regression 
log R/S(T) = log C + H log T. 


Hence, the definition for the estimator of H is given by the following equation 


Definition 4.1 (Estimator H by R/S). The estimator H based on the rescaled range R/S analysis is given 

by 

T (ins log R/S(t) logt) — (eas log R/S) logt) 
Ti (log i?) = Oi log t)? 

The empirical confidence interval of H given in Eguation (10] with sample size T = 2N 2002) is 


(10) 


Aris = 


(0.5 — exp(—7.33 log(log N) + 4.21), exp(—7.20 log(log N) + 4.04) + 0.5). 


In this paper, the R package called pracma is adopted to estimate the value of H adopting the R/S analysis. 
For the data set of rented hotel room numbers in a month, the estimated Corrected R over S Hurst exponent is 
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0.908, which indicates that there exist a strong long memory in this data. Furthermore, according to Figure 
that shows the ACF and periodogram plot for rented hotel room numbers, the class of long memory 
structure is a typical Gegenbauer long memory pattern with apparent oscillatory structure in ACF plot. 
For the periodogram plot, the peaks locates at non-zero position which aligns with the characteristics for 
Gegenbauer long memory type models. 


The number of rented room in Denmark 
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Figure 2: Plot of ACF and periodogram for rented hotel room numbers 


4.2 Model fitting 


In this section, Model 1 to Model 8 are adopted to fit rented hotel room numbers in a month. Industrial 
production index (IPI), power production (PG) and unemployment rate (UR) are incorporated to improve 
model feasibility. The in-sample fitting performances of seasonal component and Gegenbauer long memory 
component are compared in this study. To monitoring the convergence of Bayesian approach, the values of 
R for each estimators are between 1.0000 and 1.0003 and the number of effective sample are always more 
than 80,000. Figure[3]is an example of convergence test, which reports the MCMC sample path for several 
key parameters for Model 1. According to these plots, the model parameters are properly estimated. 
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Figure 3: Trace plots for part of Model 1 parameters 


The goodness of fit for the eight models are evaluated by using DIC to select the best fitting model for 


each data set. Table [2| reported the DIC values of these models. The performance of the models incorpo- 
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rating Gegenbauer long memory or seasonal component into error terms are significantly better than the 
models with these components on the explanatory variables X. This indicates that the historical informa- 
tion provided by the explanatory variables X will cause negative impacts on modelling performance. As a 
contrast, the error terms only keep essential characteristics of previous knowledge for modelling. Further- 
more, the performance of model with seasonal component is similar with the long memory model, since 
the differences of DIC values between both type models are very small. Consequently, for in-sample fitting 
staudy, Model 1, Model 2, Model 5 and Model 6 outperform other models. 


Table 2: DIC results. 


Model 1 X =IPI X =UR X =PG 
DIC 1285.48 825.40 1440.54 
Model 2 | X, =IPI, X;-PG X, =IPI, X;-UR | X; =UR, X5-PG 
DIC 1325.06 794.77 863.48 
Model 3 X <—IPI X =UR X —PG 
DIC 2367.663 2115.049 2510.72 
Model 4 X =IPI X =UR X —PG 
DIC 2416.19 2291.37 2516.81 
Model 5 X =IPI X =UR X =PG 
DIC 1287.67 825.87 1327.78 
Model 6 | X, =IPI, X;»-PG | X, =IPI, X»-UR | X, =UR, X,=PG 
DIC 1318.26 820.23 847.93 
Model 7 X =IPI X =UR X =PG 
DIC 2397.63 2187.52 2508.97 
Model 8 X =IPI X =UR X =PG 
DIC 2417.82 2128.53 2607.18 


where HR is short for the number of rented hotel room, PG is short for power production, and UR is short 
for unemployment rate. 


The figures show the in-sample fit performance, which confirms that there is no significant evidence to 
claim long memory model superior than seasonal model in model fitting. 
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Figure 4: In-sample fitting plot for Model 1 with X =IPI 
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Figure 5: In-sample fitting plot for Model 1 with X =PG 
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Figure 6: In-sample fitting plot for Model 1 with X =UR 
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Figure 7: In-sample fitting plot for Model 5 with X =IPI 
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Figure 9: In-sample fitting plot for Model 5 with X =UR 
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Figure 10: In-sample fitting plot for Model 2 with X1 =IPI and X2 =UR 
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Figure 11: In-sample fitting plot for Model 2 with X; —IPI and X2 =PG 
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Figure 12: In-sample fitting plot for Model 2 with X; —PG and X2 =UR 
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Figure 13: In-sample fitting plot for Model 6 with X1 —IPI and X2 =UR 
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Figure 15: In-sample fitting plot for Model 6 with X; =PG and X2 =UR 


4.3 Model forecast 


In this section, we calculate one-step ahead forecast for m = 20 time points based on the posterior pre- 
dictive distributions and the posterior sample size of L = 90,000. Only Model 1, Model 2, Model 5 and 
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Model 6 are adopted in out-sample forecasting study because these models show reasonable in-sample fit- 
ting performance. The forecasts ŷ, are given by the posterior mean or median. To evaluate the forecast 
performance, three types of measures, namely residuals r, = y; — ŷ;, percentage errors p; = 7 x 100 and 
scaled errors q, are adopted. Based on r; and p;, three popular criteria, namely the mean absolute error 
(MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) are calculated in Ta- 
ble[3] Overall, Gegenbauer long memory models with smaller criteria values provide more accurate forecast 


results than seasonal models. Moreover, the forecast performance can be greatly improved by incorporating 
more explanatory variables. 


Table 3: Comparison of models in forecasts. 


Model 1 X —IPI X =UR X =PG 
MAE 15.43 10.71 8.08 
RMSE 16.75 12.01 9.59 
MAPE 0.16 0.11 0.09 
MASE 1.72 1.34 0.90 

Model 5 X =IPI X =UR X =PG 
MAE 18.08 12.01 12.91 
RMSE 20.57 13.16 14.89 
MAPE 0.18 0.12 0.13 
MASE 2.02 1.19 1.44 
Model 2 Xı —IPI, X2-PG 1 —IPI, X2-UR Xı =UR, Xə=PG 
MAE 6.91 9.91 7.63 
RMSE 8.02 11.27 9.17 
MAPE 0.07 0.10 0.08 
MASE 0.77 1.10 0.85 
Model 6 | X, =IPI, X.=PG | X, =IPI, X,=UR | X, =UR, X,=PG 

MAE 10.92 9.92 8.27 
RMSE 12.77 11.54 9.84 
MAPE 0.11 0.10 0.09 
MASE 1:22 1.11 0.92 


where HR is short for the number of rented hotel room, PG is short for power production, and UR is short 
for unemployment rate. 


Figures below show the forecast results, which agree with Table B] Gegenbauer long memory type mod- 
el should be the best choice in dealing with the number of rented hotel room data with Gegenbauer long 
memory features. Seasonal models cannot replace Gegenbauer long memory model since some fundamen- 


tal features cannot be capture by seasonal component. 
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Out-sample forecast results by GLRGARMA model 
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Figure 16: Out-sample forecasting plot for Model 1 with X =IPI 
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Figure 17: Out-sample forecasting plot for Model 1 with X =PG 
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Out-sample forecast results by GLRGARMA model 
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Figure 18: Out-sample forecasting plot for Model 5 with X =UR 
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Figure 19: Out-sample forecasting plot for Model 5 with X =IPI 
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Out-sample forecast results by GLRSARMA model 
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Figure 20: Out-sample forecasting plot for Model 5 with X —PG 
Out-sample forecast results by GLRSARMA model 
1 — Observed data 
œ | — Forecasted data ` 
= 
o el 
o 
oo. 
oo 
£ 
Uo. 
200 
C 
Lo 
o. 
o. 
o 
07/2016 05/2017 Mr 018 01/2019 11/2019 
ime 


Figure 21: Out-sample forecasting plot for Model 5 with X =UR 
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Out-sample forecast results by GLRSARMA model 
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Figure 22: Out-sample forecasting plot for Model 2 with X; —IPI and X2 =UR 
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Figure 23: Out-sample forecasting plot for Model 2 with X; =IPI and X2 =PG 
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Out-sample forecast results by GLRSARMA model 
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Figure 24: Out-sample forecasting plot for Model 2 with Xı —PG and X2 =UR 
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Figure 25: Out-sample forecasting plot for Model 6 with X; =IPI and X2 =UR 
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Out-sample forecast results by GLRSARMA model 
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Figure 26: Out-sample forecasting plot for Model 6 with X; —IPI and X2 =PG 
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Figure 27: Out-sample forecasting plot for Model 6 with X; —PG and Xə =UR 


5 Conclusion 
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This paper proposed a generalised linear regression structure with a innovative function of explanatory 
variables. Essential relevant information for modelling can be take into the consideration via explanatory 
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variables. To capture the Gegenbauer long memory feature in a time series, seasonal and Gegenbauer 
long memory components are incorporated to enhance model feasibility. Moreover, to improve model 
flexibility, the generalised Poisson (GP) distribution with over- equal- and under-dispersion is adopted. 
Overall, eight sub-models are implemented with the number of rented hotel room data set to evaluate the 
model performance. Furthermore, for the number of rented hotel room data in tourism area, the Gegenbauer 
long memory feature is revealed. Especially, the fundamental differences between seasonal component and 
Gegenabuer long memory component are distinguished. By plotting ACF and periodogram graphs, the 
long memory pattern is investigated. Beyesian approach is applied to implement in-sample fitting and out- 
sample forecast studies. Several model selection criteria are adopted to select most feasible model. Overall, 
GLRGARMA model is evaluated to be the best model to handle the time series with Gegenbauer long 
memory feature, especially in tourism area. 
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